Multi-document summarization using closed patterns
Authors
Abstract
Multi-document summarization methods fall into two main categories: term-based and ontology-based. Term-based methods cannot deal with polysemy and synonymy. Ontology-based approaches address these problems by taking into account the semantic information of document content, but constructing an ontology requires considerable manual effort. To overcome these open problems, this paper presents a pattern-based model for generic multi-document summarization, which exploits closed patterns to extract the most salient sentences from a document collection and reduce redundancy in the summary. Our method calculates the weight of each sentence in a document collection by accumulating the weights of the closed patterns that cover it, and iteratively selects the sentence that has the highest weight and the least similarity to the previously selected sentences, until the length limit is reached. Calculating sentence weights from patterns reduces dimensionality and captures more relevant information. Our method combines the advantages of the term-based and ontology-based models while avoiding their weaknesses. Empirical studies on the benchmark DUC2004 datasets demonstrate that our pattern-based method significantly outperforms the state-of-the-art methods. Multi-document summarization can also be used to extract a particular individual's opinions, in the form of closed patterns, from the documents this individual shares in social networks, and hence provides a useful tool for further analyzing the individual's behavior and influence in group activities. © 2016 Elsevier B.V. All rights reserved.
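The weighting and selection loop described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything specific here is an assumption made for illustration: sentences are treated as lowercase term sets, a pattern's weight is its support (the number of sentences containing it), similarity is Jaccard overlap, and the selection score is the sentence weight scaled by one minus the highest similarity to an already selected sentence. The function names `closed_patterns`, `jaccard` and `summarize` are hypothetical, and the pattern enumeration is a naive one suited only to toy inputs.

```python
def closed_patterns(term_sets, min_support=2):
    """Return {pattern: support} for closed termsets (naive, toy inputs only).

    A termset is closed when it equals the intersection of all sentences
    that contain it, so the closed termsets are exactly the non-empty
    intersections of sentence term sets.
    """
    closed = set(term_sets)
    frontier = set(closed)
    while frontier:
        new = set()
        for a in frontier:
            for b in closed:
                c = a & b
                if c and c not in closed:
                    new.add(c)
        closed |= new
        frontier = new
    result = {}
    for pattern in closed:
        support = sum(1 for t in term_sets if pattern <= t)
        if support >= min_support:
            result[pattern] = support
    return result


def jaccard(a, b):
    """Jaccard similarity of two term sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def summarize(sentences, max_words=8, min_support=2):
    """Greedily pick high-weight, low-redundancy sentences within max_words."""
    term_sets = [frozenset(s.lower().split()) for s in sentences]
    patterns = closed_patterns(term_sets, min_support)
    # Assumed weighting: sum the supports of the closed patterns a sentence covers.
    weights = [sum(sup for p, sup in patterns.items() if p <= t) for t in term_sets]

    selected, used = [], 0
    remaining = set(range(len(sentences)))
    while remaining:
        def score(i):
            redundancy = max(
                (jaccard(term_sets[i], term_sets[j]) for j in selected),
                default=0.0,
            )
            return weights[i] * (1.0 - redundancy)

        best = max(sorted(remaining), key=score)
        remaining.discard(best)
        length = len(sentences[best].split())
        if used + length > max_words:
            continue  # does not fit; try the next-best sentence
        selected.append(best)
        used += length
    return [sentences[i] for i in selected]
```

Calling `summarize` on a handful of short sentences with a small `max_words` returns the sentences in selection order. A realistic implementation would replace the naive enumeration with a dedicated closed-pattern mining algorithm and would use the weighting defined in the paper itself.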
Related resources
A survey on Automatic Text Summarization
Text summarization endeavors to produce a summary version of a text, while maintaining the original ideas. The textual content on the web, in particular, is growing at an exponential rate. The ability to decipher through such massive amount of data, in order to extract the useful information, is a major undertaking and requires an automatic mechanism to aid with the extant repository of informa...
Entity type modeling for multi-document summarization : generating descriptive summaries of geo-located entities
In this work we investigate the application of entity type models in extractive multi-document summarization using the automatic caption generation for images of geo-located entities (e.g. Westminster Abbey, Loch Ness, Eiffel Tower) as an application scenario. Entity type models contain sets of patterns aiming to capture the ways the geo-located entities are described in natural language. They ...
Multi-Document Summarization Using Document Set Type Classification
In this paper, we propose a summarization system which automatically classifies type of document set and summarizes a document set with its appropriate summarization mechanism. This system will classify a document set into three types: (a) One topic type, (b) multi-topic type, and (c) others. These types will be identified using information of high frequency nouns and Named Entity. In our multi...
A Generative Approach for Multi-Document Summarization using the Noisy Channel Model
Multi-document summarization is the automatic production of a unique summary from a collection of texts. This task has become very important, since it assists the information processing in days where the amount of information is growing considerably. In this paper, we propose a statistical generative approach for multi-document summarization. In particular, we formulate the multi-document summa...
Multi-Document Arabic Summarization Using Text Clustering to Reduce Redundancy
“The process of multi-document summarization is producing a single summary of a collection of related documents. In this work we focus on generic extractive Arabic multi-document summarizers. We also describe the cluster approach for multi-document summarization. The problem with multi-document text summarization is redundancy of sentences, and thus, redundancy must be eliminated to ensure cohe...
Journal title:
- Knowl.-Based Syst.
Volume 99
Publication date: 2016